-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ResolveGroupingAnalytics analyzer rule #5749
Conversation
I am not an expert in GROUPING sets / etc. I wonder if someone else has the time to review this PR? |
@yjshen Could you please help to review this PR ? |
It looks like no one has reviewed this PR, and I'm not an expert about GROUP. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @mingmwang , It's a excellent job!
match self { | ||
AggregateFunction::GroupingId => { | ||
write!(f, "GROUPING_ID") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
look like we don't need it
@@ -110,6 +110,28 @@ async fn csv_query_group_by_boolean() -> Result<()> { | |||
Ok(()) | |||
} | |||
|
|||
#[tokio::test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A future ticket: add some test in slt
@@ -2856,8 +2856,8 @@ fn aggregate_with_rollup() { | |||
let sql = | |||
"SELECT id, state, age, COUNT(*) FROM person GROUP BY id, ROLLUP (state, age)"; | |||
let expected = "Projection: person.id, person.state, person.age, COUNT(UInt8(1))\ | |||
\n Aggregate: groupBy=[[GROUPING SETS ((person.id), (person.id, person.state), (person.id, person.state, person.age))]], aggr=[[COUNT(UInt8(1))]]\ | |||
\n TableScan: person"; | |||
\n Aggregate: groupBy=[[person.id, ROLLUP (person.state, person.age)]], aggr=[[COUNT(UInt8(1))]]\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
@@ -202,6 +202,77 @@ impl PartialEq<dyn Any> for UnKnownColumn { | |||
} | |||
} | |||
|
|||
#[derive(Debug, Hash, PartialEq, Eq, Clone)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
let mut group_data = group | ||
.iter() | ||
.enumerate() | ||
.map(|(idx, is_null)| { | ||
if *is_null { | ||
null_exprs_value[idx].clone() | ||
} else { | ||
exprs_value[idx].clone() | ||
} | ||
}) | ||
.collect::<Vec<_>>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can extract a function named get_group_data
.
group | ||
.iter() | ||
.enumerate() | ||
.map(|(idx, is_null)| { | ||
if *is_null { | ||
null_exprs_value[idx].clone() | ||
} else { | ||
exprs_value[idx].clone() | ||
} | ||
}) | ||
.collect::<Vec<_>>() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use function get_group_data
/// Hidden grouping set expr in the grouping set | ||
hidden_grouping_set_expr: Vec<(Arc<dyn PhysicalExpr>, String)>, | ||
/// Distinct result expr for the grouping set, used to generate output schema | ||
result_expr: Vec<(Arc<dyn PhysicalExpr>, String)>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a question.
Is result_expr
a part of grouping_set_expr
?
If yes, maybe we can use a index vec to point grouping_set_expr
if !expr.contains_hidden_columns() | ||
&& cols.iter().all(|c| group_expr_columns.contains(c)) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hiddent columns
may influent optimizer rule, it may cause some bug.
We should add more test to cover it in the following PR.
@@ -1364,6 +1423,8 @@ fn create_name(e: &Expr) -> Result<String> { | |||
"Create name does not support qualified wildcard".to_string(), | |||
)), | |||
Expr::Placeholder { id, .. } => Ok((*id).to_string()), | |||
Expr::HiddenColumn(_, c) => Ok(format!("#{}", c)), | |||
Expr::HiddenExpr(first, _) => Ok(format!("#{}", first)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should HiddenExpe
be Ok(format!("{}", first))
? look like #
is wrong.
|
||
pub struct ResolveGroupingAnalytics; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add some comment to explain this rule resolve what problem.
What is the status of this PR? It has accumulated non trivial conflicts. Shall we mark it as a draft ? |
Yes, please mark it as a draft. |
Since this has been open for more than a year, closing it down. Feel free to reopen if/when you keep working on it. |
Which issue does this PR close?
Closes #5672
Closes #5649
Closes #5647.
Some TPC-DS(q27, q36, q70, q86)queries will be impacted by this PR also. Those queries include the
GROUPING()
functionsand the actually physical plans were not able to evaluate before this PR.
Rationale for this change
What changes are included in this PR?
enumerate_grouping_sets
logic to this ruleGROUPING()
andGROUPING_ID()
functions.Are these changes tested?
Are there any user-facing changes?